Introduction

In biology, cells are not independent entities and they dynamically react to stimuli from their environment, including such from other cells, and we we refer to those as cell-cell communication events (CCC). CCC events are essential for biological processes like apoptosis and cell migration, and are hence essential in homeostasis, development, and disease.

As a consequence of the continuously growing popularity of dissociated and spatial transcriptomics data, CCC inference is becoming a routine approach in transcriptomics data analysis.

CCC commonly refers to interactions between secreted ligands and a corresponding plasma membrane receptor. However, this picture can be broadened to include secreted enzymes, extracellular matrix proteins, transporters, and interactions that require the physical contact between cells, such as cell-cell adhesion proteins and gap junctions. Furtheremore, CCC is not independent of other process, but rather the contrary, as intercellular interactions elicit a downstream responses, including the induction of known pathways or transcription factors.

Set-up the Env

Before we start, we would need to install the LIANA framework packages that we will use: LIANA and NicheNet.

if(!require(liana)) remotes::install_github("saezlab/liana")
## Loading required package: liana
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(liana)

Load the single-cell data but now as Seurat object

sobj <- readRDS("data/sobj.RDS")

Ligand-Receptor Inference

We will first explore perhaps the most common and likely simplest methods, i.e. those which aim to prioritize ligand-receptor interactions alone.

There is an ever-growing number of methods aimed at this particular inference task. We recently have harmonized seven of them, along with 16 ligand-receptor resources in the ligand-receptor framework LIANA.

url = "https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41467-022-30755-0/MediaObjects/41467_2022_30755_Fig1_HTML.png?as=webp"

Often many of the ligand-receptor tools come in fixed combinations of method and resource, and we will quickly run a couple of them via LIANA. Namely CellPhoneDB and NATMI, each with it’s corresponding resource.

As is commonly the case, we will apply those methods to predict interactions from steady-state, or single-context data. More specifically, we will focus on inferring the interactions thought to be most relevant among all cell types.

This, we need to first subsample the Seurat object with COVID-19 infected blood cells alone.

sobj_covid <- sobj[, sobj@meta.data$condition=="covid"]
# Remove erythroid cells
sobj_covid <- sobj_covid[, !str_detect(sobj_covid@meta.data$cell_type, "erythroid|Plasma")]
sobj_covid@meta.data$cell_type <- as.factor(sobj_covid@meta.data$cell_type)

NATMI

We will first run NATMI, a method which in single-context data, proposes an example of a score which aims at inferring interaction specificity. This is done under the assumption that interactions specific to a given pair of cell types are more interesting than those shared between all cell types.

Q: Is this logic flawed? If you think it is, then is differential analysis between cell types also flawed?

NATMI’s specificity edge (Adapted from Burmedi et al 2021, Unpublished)

natmi_res <- liana_wrap(sobj_covid,
                        method = "natmi",
                        resource="connectomeDB2020") %>%
  rank_method(method_name = "natmi",
              mode = "specificity")
## Expression from the `RNA` assay will be used
## Running LIANA with `colLabels`/`Idents` as labels (matching column in metadata not found).
## Warning in exec(output, ...): 330 genes and/or 0 cells were removed as they had
## no counts!
## Warning: `invoke()` is deprecated as of rlang 0.4.0.
## Please use `exec()` or `inject()` instead.
## This warning is displayed once per session.
## LIANA: LR summary stats calculated!
## Now Running: Natmi

Plot the Results

natmi_res %>%
  liana_dotplot(target_groups = c("Monocytes", "T cells", "B cells naive"),
                specificity ="edge_specificity",
                magnitude = "prod_weight",
                ntop = 20) +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5,
                                   hjust= 0.5))

Run CellPhoneDB

CellPhoneDB(v2) uses a permutation-based approach which reshuffles cell labels in order to generate a Null distribution. This distribution is then used to estimate p-values that represent the specificity of the interactions across all cell types.

Hence, a typical usage of CellPhoneDB in a steady-state data would imply that one can focus on the mean expression between a ligand and receptor, subsequent to filtering according to a significance threshold.

cpdb_res <- liana_wrap(sobj_covid,
                       method = "cellphonedb",
                       resource="CellPhoneDB",
                       # lower the number of permutations
                       permutation.params = list(nperms = 100)
                       )
## Expression from the `RNA` assay will be used
## Running LIANA with `colLabels`/`Idents` as labels (matching column in metadata not found).
## Warning in exec(output, ...): 330 genes and/or 0 cells were removed as they had
## no counts!
## LIANA: LR summary stats calculated!
## Now Running: Cellphonedb

Plot Results

# identify interactions of interest
cpdb_int <- cpdb_res %>%
  # only keep interactions with p-val <= 0.05
  filter(pvalue <= 0.05) %>% # this reflects interactions `specificity`
  # then rank according to `magnitude` (`lr_mean` in this case)
  rank_method(method_name = "cellphonedb",
              mode = "magnitude") %>%
  # keep top 20 interactions (regardless of cell type)
  distinct_at(c("ligand.complex", "receptor.complex")) %>%
  head(20)

# Plot results
cpdb_res %>%
  # keep only the interactions of interest
  inner_join(cpdb_int, 
             by = c("ligand.complex",
                    "receptor.complex")
             ) %>%
  # invert size (low p-value/high specificity = larger dot size)
  # + add a small value to avoid Infinity for 0s
  mutate(pvalue = -log10(pvalue + 1e-10)) %>% 
  liana_dotplot(target_groups = c("Monocytes", "T cells", "B cells naive"),
                specificity = "pvalue",
                magnitude = "lr.mean",
                show_complex = TRUE,
                size.label = "-log10(p-value)") +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5,
                                   hjust= 0.5))

Note the “_” - this is meant to separate subunits of heteromeric complexes. Q: Why could this be important?

We also note that there is little agreement between our two runs, even though despite their different algorithms, resources, and assumptions, both aim at predicting the most relevant interactions.

In a recent publication, we saw that both the method and the resource of choice could affect the predictions that we get Dimitrov el al., 2022.

Q: Does that mean that we cannot trust CCC predictions? (Not necessarily!)

Using LIANA as a Consensus

Finally, we will use LIANA to generate a consensus score from multiple methods, and we will use a Consensus resource, which is built from multiple literature-curated ligand-receptor resources.

liana_res <- liana_wrap(sobj_covid,
                        resource = "Consensus",
                        verbose = FALSE,
                        # lower perms
                        permutation.params = list(nperms = 100)) %>%
  # aggregate the scores from each method
  liana_aggregate()
## Expression from the `RNA` assay will be used
## Now aggregating natmi
## Now aggregating connectome
## Now aggregating logfc
## Now aggregating sca
## Now aggregating cellphonedb
## Aggregating Ranks
# these are liana's default parameters, so one can also just run:
# liana_res <- liana_wrap(sobj_covid)

We will now plot the results one last time.

By default, we use SingleCellSignalR’s magnitude of expression of the ligand and receptor, and NATMI’s specificity weights to show how specific a given interaction is to the source(L) and target(R) cell types.

Also, the ligand-receptor interactions are now ordered by a consensus scores integrating all methods implemented in LIANA, calculated with robust rank aggregate.

liana_res %>%
  liana_dotplot(target_groups = c("Monocytes", "T cells", "B cells naive"),
                ntop = 20) +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5,
                                   hjust= 0.5))

NicheNet

Load and Install NicheNet

if(!require(nichenetr)) remotes::install_github("saeyslab/nichenetr")
## Loading required package: nichenetr
library(nichenetr)

NicheNet: studying cell-cell communication by bridging ligand-receptor infence tools and footprint approaches

NicheNet is a second type of approach to study cell-cell communication from scRNA-seq data. The first type of tools described here above are useful to get an extensive list of expressed / cell-type specific interactions between the different cell types in your data. However, the output is typically a very large list of possible ligand-receptor interactions between all cell types (cf output table of CellphoneDB, NATMI, Liana,…). Consequently, it is not immediately clear what the most important interactions are.

NicheNet (paper: https://www.nature.com/articles/s41592-019-0667-5; github: https://github.com/saeyslab/nichenetr) is a cell-cell communication algorithm that was developed to help prioritizing ligand-receptor interactions for further experimental validation. It does this by considering the signaling effect that these ligand-receptor interactions have. When a certain cell type (“sender cell type”) produces a ligand that can bind to a receptor on the membrane of another cell type (“receiver cell type”), signal transduction pathways are activated, leading to changes in gene expression. The genes of which expression changes in response to a ligand-receptor interaction (“target genes”) are typically specific for each ligand (family).

NicheNet uses this information then to look for whether there are target genes of expressed ligand-receptor pairs enriched in the receiver cell type after the cell-cell communication process. NicheNet thus ranks / prioritizes ligands based on the gene regulatory effect (“ligand activity”) they might have.





To do this, NicheNet integrates prior knowledge of ligand-receptor, signaling and gene regulatory interactions to predict which ligands might regulate which target genes. This type of knowledge is then summarized in a prior model of ligand-target regulatory potential. This model is used in combination with expression data of your interacting cells to predict active ligands, and the specific target genes induced by them.





You might have noticed already the conceptual similarity between this NicheNet ligand activity prediction procedure other forms of footprint analyses, such as those implemented in declouper! Instead of pathway-gene associations, or TF-gene associations, NicheNet uses ligand-gene assocations: based on prior knowledge we can predict which genes might be regulated by which ligands – and based on these associations NicheNet looks for enrichment in your data. It is most similar to the CytoSig Cytokine Activity prediction. The main difference with the CytoSig Cytokine Activity prediction is that NicheNet has ligand-gene associations for all ligands in the ligand-receptor database, whereas CytoSig provides this only for a limited number of Cytokines. This is because CytoSig is only based on experimental data: genes that are DE after stimulating cells in vitro are considered to be genes regulated by the cytokine/ligand. NicheNet uses this information as well, but makes also predictions based on other types of prior knowledge. Another difference is that NicheNet uses a probability model of target genes regulated by ligands, and that it does not provide a discrete model of which genes are a target, and which not (in contrast to most footprint approaches).

So to conclude, NicheNet combines the principles of ligand-receptor prediction tools with footprint prediction tools in one framework to study cell-cell interactions. It predicts 1) which ligands from one or more cell population(s) (“sender/niche”) are most likely to affect target gene expression in an interacting cell population (“receiver/target”) and 2) which specific target genes are affected by each of these prioritized ligands.

One important final note before we go to applying NicheNet to our dataset. Because NicheNet studies how ligands affect gene expression in putatively neighboring/interacting cells, you need to have data about this effect in gene expression you want to study. So, there need to be ‘some kind of’ differential expression in a receiver cell population, caused by ligands from one of more interacting sender cell populations. In other words: you need to be able to define which genes in a receiver cell type are likely affected by cell-cell communication processes. Ideally, you have two or more conditions in your data, and gene expression differences can be attributed to differences in cell-cell communication processes (treatment vs control, knock-out vs wild type, etc.). In this tutorial, this requirement is fulfilled: we can compare communication between cells in COVID-19 and healthy patients.

NicheNet application: predict driving interactions in COVID-19

! Note: in contrast to LIANA: NicheNet uses sender and receiver cells as terminology instead of source and target.

The pipeline of a basic NicheNet analysis consist mainly of the following steps:

  • 0.) Load in the prior knowledge models of NicheNet: it’s ligand-receptor network and ligand-target matrix
  • 1.) Define which cell types you want to consider as senders, and which cell type as receiver
  • 2.) Define a set of ligands that **potentially affect* your receiver population: these are ligands that are expressed by the sender cell population(s) and bind a (putative) receptor expressed by the receiver population
  • 3.) Define a gene set of interest: these are the genes in the receiver” cell population that are potentially affected by ligands expressed by interacting cells (e.g. genes differentially expressed upon cell-cell interaction)
    1. Perform NicheNet ligand activity analysis: rank the potential ligands based on the presence of their target genes in the gene set of interest (compared to a background set of genes)
    1. Infer top-predicted target genes of ligands that are top-ranked in the ligand activity analysis

This vignette guides you in detail through all these steps.

NicheNet application without using LIANA’s output

Step 0: Load in the prior knowledge models of NicheNet: it’s ligand-receptor network and ligand-target matrix

As mention in the introduction here, NicheNet predicts important cell-cell communication processes through integration of known ligand-receptor, signaling and gene regulatory interactions. We very recently updated these networks, so we now have a NicheNet-v2 prior knowledge model (for human and mouse). You as participant of this workshop can already download these prior knowledge networks from Zenodo DOI. If the connection to Zenodo would fail, you can find the objects required for this script also in the “nichenet_input” folder.

A first object we will read in is the ligand_target_matrix: this matrix denotes the potential (based on prior knowledge) that a ligand might regulate the expression of a target genes, for all possible ligand-target gene pairs. This matrix is necessary to prioritize possible ligand-receptor interactions based on observed gene expression effects (i.e. NicheNet’s ligand activity analysis) and infer affected target genes of these prioritized ligands.

A second object is the lr_network, which is the database of ligand-receptor interactions that we will use to define expressed ligands, receptors and their interactions.

A third object is the weighted_networks: this object contains the integrated ligand-receptor, signaling and gene-regulatory interactions.

These different objects will be given as input to the NicheNet function later on.

# ligand_target_matrix <- readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds"))
ligand_target_matrix <- readRDS("data/nichenet_input/ligand_target_matrix_nsga2r_final.rds") # run this in case the connection to Zenodo does not work ("cannot open URL 'https://zenodo.org/record/3260758/files/ligand_target_matrix.rds': HTTP status was '502 Bad Gateway'")

ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
##                     A2M        AANAT        ABCA1          ACE        ACE2
## A-GAMMA3'E 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000000
## A1BG       0.0018503922 0.0011108718 0.0014225077 0.0028594037 0.001139013
## A1BG-AS1   0.0007400797 0.0004677614 0.0005193137 0.0007836698 0.000375007
## A1CF       0.0024799266 0.0013026348 0.0020420890 0.0047921048 0.003273375
## A2M        0.0084693452 0.0040689323 0.0064256379 0.0105191365 0.005719199
# lr_network <- readRDS(url("https://zenodo.org/record/7074291/files/lr_network_human_21122021.rds"))
lr_network <- readRDS("data/nichenet_input/lr_network_human_21122021.rds") # run this in case the connection to Zenodo does not work

head(lr_network)
## # A tibble: 6 × 4
##   from  to     database             source              
##   <chr> <chr>  <chr>                <chr>               
## 1 A2M   MMP2   omnipath             omnipath            
## 2 A2M   MMP9   omnipath             omnipath            
## 3 A2M   LRP1   omnipath             omnipath            
## 4 A2M   KLK3   omnipath             omnipath            
## 5 AANAT MTNR1A nichenet_verschueren nichenet_verschueren
## 6 AANAT MTNR1B nichenet_verschueren nichenet_verschueren
# weighted_networks <- readRDS(url("https://zenodo.org/record/7074291/files/weighted_networks_nsga2r_final.rds"))
weighted_networks <- readRDS("data/nichenet_input/weighted_networks_nsga2r_final.rds") # run this in case the connection to Zenodo does not work
head(weighted_networks$lr_sig) # interactions and their weights in the ligand-receptor + signaling network
## # A tibble: 6 × 3
##   from       to         weight
##   <chr>      <chr>       <dbl>
## 1 A-GAMMA3'E ACTG1P11   0.100 
## 2 A-GAMMA3'E AXIN2      0.0869
## 3 A-GAMMA3'E BUB1B-PAK6 0.0932
## 4 A-GAMMA3'E CEACAM7    0.0793
## 5 A-GAMMA3'E CHRNA1     0.0901
## 6 A-GAMMA3'E DTX2P1     0.0976
head(weighted_networks$gr) # interactions and their weights in the gene regulatory network
## # A tibble: 6 × 3
##   from  to     weight
##   <chr> <chr>   <dbl>
## 1 A1BG  A2M    0.165 
## 2 AAAS  GFAP   0.0906
## 3 AADAC CTAG1B 0.104 
## 4 AADAC CYP3A4 0.177 
## 5 AADAC DIRAS3 0.0936
## 6 AADAC IRF8   0.0892

Step 1: Define which cell types you want to consider as senders, and which cell type as receiver

NicheNet predicts the ligands most likely be regulating expression in a receiver cell type of interest. Therefore, we will need to define one receiver cell type per NicheNet analysis.

receiver <- "Monocytes"

As potential senders, we can take every cell type of interest:

sender_celltypes <- c("Monocytes","Neutrophils", "T cells", "Dendritic cells", "NK cells", "Platelets", "B cells naive")

Step 2: Define a set of ligands that **potentially affect* your receiver population: these are ligands that are expressed by the sender cell population(s) and bind a (putative) receptor expressed by the receiver population

First, defines which genes are expressed in receiver and senders. In the default way described in the NicheNet vignettes, we look at the fraction of cells in a cell type that express a gene, and we consider genes to be expressed if the gene is expressed by a sufficiently large fraction of cells (eg 5% or 10%). In this tutorial, we put this threshold here at 5%.

expressed_genes_receiver <- get_expressed_genes(receiver, sobj, pct = 0.05)
list_expressed_genes_sender <- sender_celltypes %>% unique() %>% lapply(get_expressed_genes, sobj, 0.05) # lapply to get the expressed genes of every sender cell type separately here
expressed_genes_sender <- list_expressed_genes_sender %>% unlist() %>% unique()

Now filter out the ligands and receptors from the database to only include expressed ligand-receptor pairs.

## receiver
ligands <- lr_network %>% pull(from) %>% unique()
receptors <- lr_network %>% pull(to) %>% unique()

expressed_ligands <- intersect(ligands,expressed_genes_sender)
expressed_receptors <- intersect(receptors,expressed_genes_receiver)

potential_ligands <- lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) %>% pull(from) %>% unique()

Note: if you don’t have information about sender cell types, you can choose all the ligands in the NicheNet database as potential_ligands. This is dan similar to the CytoSig Cytokine Activity prediction, but with a filter on receptors that should be expressed by the recever.

Step 3: Define a gene set of interest: these are the genes in the receiver” cell population that are potentially affected by ligands expressed by interacting cells (e.g. genes differentially expressed upon cell-cell interaction)

In this step, we will use the DE results generated during the decoupler vignette, and use the same thresholds to define the DE genes in the receiver cell population.

DE_table <- readr::read_delim("data/deg.csv",delim = ",")
## Rows: 44190 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): contrast, name
## dbl (3): logFCs, pvals, adj_pvals
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
geneset_oi <- DE_table %>%
  filter(contrast == receiver) %>%
  filter(pvals < 0.05 & abs(logFCs) > 0.50) %>%
  pull(name) %>%
  unique()

For the ligand acivity analysis, we also need to define the genomic background, which we choose here to be all genes in the receiver cell type considered for the DE analysis.

background_expressed_genes <- DE_table %>% filter(contrast == receiver) %>% pull(name) %>% unique()

Both our geneset of interest and background should only contain genes for which we have prior knowledge (so; that is part of our ligand-target matrix)

print(setdiff(geneset_oi, rownames(ligand_target_matrix))) # which genes will not be considered
##  [1] "SINHCAF-1"  "RPS26P21"   "ARHGAP27-2" "HLA-DQA1-3" "CANX-1"    
##  [6] "FRG1-1"     "BDP1-2"     "FCAR-5"     "ABCC1-1"    "DDX39B-2"  
## [11] "FCAR-4"     "PTEN-1"     "DDX39B-5"   "NAP1L4-1"   "DDX39B-4"  
## [16] "LILRB1-1"   "LONRF1-1"   "TAPBP-1"    "TAPBP-3"    "RPS17-1"   
## [21] "POLR1H-6"   "GNL1-6"     "RPS18-2"    "FCAR-2"     "HLA-DMB-3"
geneset_oi <- geneset_oi %>% .[. %in% rownames(ligand_target_matrix)]
background_expressed_genes <- background_expressed_genes %>% .[. %in% rownames(ligand_target_matrix)]
print(length(geneset_oi))
## [1] 250
print(length(background_expressed_genes))
## [1] 6899

Note: we want to have good ratio of geneset of interest vs background (50-1000 genes vs 5000-20000 genes). This is here the case.

Step 4: Perform NicheNet ligand activity analysis: rank the potential ligands based on the presence of their target genes in the gene set of interest (compared to the background set of genes)

To calculate ligand activities, we do the following: we will assess how well each ligand can predict which genes belong to the geneset of interest versus the background. Or in other words, we will assess whether genes with higher regulatory potential to be regulated by a ligand, are more likely to belong the geneset of interest. We assume that for active ligands, the target genes with high potential should be enriched in the geneset of interest. Mathematically we do the following: calculating the pearson correlation between the target gene regulatory potential vector of a ligand, and the vector that indicates whether a gene belongs to the geneset or not. Other metrics that are also calculated are the area under the precision-recall curve (aupr) and receiver operating characteristic curve (auroc). In our validation study, we showed that the pearson correlation coefficient was the most predictive measure of ligand activity for NicheNet-v1, but the AUPR for NicheNet-v2. If you are using the NicheNet-v2 networks and models as in this tutorial, we best use the AUPR to rank our ligands. (for the NicheNet-v1 model as currently shown in the github vignettes, this is thus based on the pearson correlation)

ligand_activities <- predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)

ligand_activities <- ligand_activities %>% arrange(-aupr) %>% mutate(rank = rank(desc(aupr)))

ligand_activities
## # A tibble: 334 × 6
##    test_ligand auroc   aupr aupr_corrected pearson  rank
##    <chr>       <dbl>  <dbl>          <dbl>   <dbl> <dbl>
##  1 TGFB1       0.547 0.0472        0.0110   0.0299     1
##  2 IL10        0.541 0.0468        0.0105   0.0377     2
##  3 OSM         0.532 0.0466        0.0104   0.0348     3
##  4 CD82        0.514 0.0438        0.00752  0.0468     4
##  5 B2M         0.519 0.0436        0.00733  0.0358     5
##  6 LEFTY1      0.540 0.0435        0.00721  0.0319     6
##  7 HGF         0.528 0.0433        0.00705  0.0322     7
##  8 NAALADL1    0.546 0.0429        0.00666  0.0340     8
##  9 GDF11       0.531 0.0428        0.00654  0.0228     9
## 10 BMP6        0.529 0.0427        0.00645  0.0286    10
## # … with 324 more rows

NicheNet just ranks ligands during the ligand activity procedure. To check by which sender cell types these ligands are expressed, you can run:

best_upstream_ligands <- ligand_activities %>% top_n(30, aupr) %>% arrange(-aupr) %>% pull(test_ligand) %>% unique()

Seurat::DotPlot(sobj, features = best_upstream_ligands, cols = "RdYlBu") + Seurat::RotatedAxis()

Step 5: Infer top-predicted target genes and receptors of ligands that are top-ranked in the ligand activity analysis

active_ligand_target_links_df <- best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 500) %>% bind_rows() %>% drop_na()

active_ligand_target_links <- prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.33)

order_ligands <- intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev() %>% make.names()
order_targets <- active_ligand_target_links_df$target %>% unique() %>% intersect(rownames(active_ligand_target_links)) %>% make.names()
rownames(active_ligand_target_links) <- rownames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23
colnames(active_ligand_target_links) <- colnames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23

vis_ligand_target <- active_ligand_target_links[order_targets,order_ligands] %>% t()
p_ligand_target_network <-  vis_ligand_target %>% make_heatmap_ggplot("Prioritized ligands","Predicted target genes", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential")  + theme(axis.text.x = element_text(face = "italic")) + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.0045,0.0090))
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p_ligand_target_network

Now visualise the receptors of these ligands:

weighted_networks_lr <-  weighted_networks$lr_sig %>% inner_join(lr_network %>% distinct(from,to), by = c("from","to"))

lr_network_top <- lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors <-  lr_network_top %>% pull(to) %>% unique()

lr_network_top_df_large <-  weighted_networks_lr %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)

lr_network_top_df <-  lr_network_top_df_large %>% spread("from","weight",fill = 0)
lr_network_top_matrix <-  lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)

dist_receptors <- dist(lr_network_top_matrix, method = "binary")
hclust_receptors <-  hclust(dist_receptors, method = "ward.D2")
order_receptors <-  hclust_receptors$labels[hclust_receptors$order]
    
dist_ligands <-  dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands <-  hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor <-  hclust_ligands$labels[hclust_ligands$order]

order_receptors <- order_receptors %>% intersect(rownames(lr_network_top_matrix))
order_ligands_receptor <-  order_ligands_receptor %>% intersect(colnames(lr_network_top_matrix))

vis_ligand_receptor_network <-  lr_network_top_matrix[order_receptors, order_ligands_receptor]
rownames(vis_ligand_receptor_network) <-  order_receptors %>% make.names()
colnames(vis_ligand_receptor_network) <-  order_ligands_receptor %>% make.names()
p_ligand_receptor_network <- vis_ligand_receptor_network %>% t() %>%
  make_heatmap_ggplot("Ligands","Receptors", color = "mediumvioletred",
                      x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network

NicheNet application with using LIANA’s output

from Liana’s output

In contrast to Liana, NicheNet ligand-receptor interactions are inferred when the ligand and receptor are just sufficiently expressed in the sender and the receiver. However, this can be prone to introduce some false positives. Therefore we can also limit the NicheNet ligand activity analysis to only those ligands that are part of cell-type specific interactions according to Liana.

This is what we will show now. We just need to redefine the potential_ligands, rerun the NicheNet ligand activity analysis and the downstream visualizations.

potential_ligands_liana <- 
  liana_res %>% 
  filter(target == receiver) %>%
  separate_rows(ligand.complex, sep="[_]") %>%
  pull(ligand.complex) %>%
  unique() %>%
  intersect(colnames(ligand_target_matrix)) # only keep common ligands

Compare the list of potential ligands from the default NicheNet analysis to the one based on LIANA

potential_ligands_liana %>% setdiff(potential_ligands) # unique for liana's ligands
## [1] "IL18"     "TNFSF13B" "PTGS2"    "ADM"      "CD24"     "POMC"     "IAPP"
potential_ligands %>% setdiff(potential_ligands_liana) # unique for NicheNet's ligands
##   [1] "A2M"       "ADGRE5"    "ALCAM"     "ALOX5AP"   "ANG"       "ANGPT1"   
##   [7] "ANGPT2"    "APOA1"     "AREG"      "BACE1"     "BACE2"     "BTLA"     
##  [13] "C3"        "CCL2"      "CCL28"     "CCL3L3"    "CCL4L2"    "CD160"    
##  [19] "CD2"       "CD226"     "CD244"     "CD300C"    "CD300LF"   "CD320"    
##  [25] "CD6"       "CD7"       "CD72"      "CD79B"     "CD82"      "CD9"      
##  [31] "CDH3"      "CEACAM3"   "CEACAM4"   "CEACAM5"   "CIB1"      "CKLF"     
##  [37] "CLDN16"    "CNTN2"     "CNTN4"     "COL18A1"   "COL27A1"   "COL4A3"   
##  [43] "COL6A2"    "COL6A3"    "COL9A2"    "CRTAM"     "CSF1"      "CTSD"     
##  [49] "CXADR"     "CXCL16"    "CXCL3"     "CXCL5"     "DUSP18"    "EDN1"     
##  [55] "EFNB1"     "ENG"       "EPHA4"     "EREG"      "EZR"       "FASLG"    
##  [61] "FCAR"      "FCN1"      "FCRL6"     "FGB"       "FGF2"      "FGF22"    
##  [67] "FLRT2"     "FLT3LG"    "FURIN"     "GDF11"     "GGT1"      "GLG1"     
##  [73] "HEBP1"     "HGF"       "HLA-DMA"   "HLA-DOA"   "HLA-DRB5"  "HP"       
##  [79] "IFITM1"    "IGFBP7"    "IL12A"     "IL1RN"     "IL23A"     "IL32"     
##  [85] "IL6"       "ITGAL"     "ITGAM"     "JAG1"      "JAML"      "KIAA0319L"
##  [91] "KLRA1P"    "LAMA1"     "LEFTY1"    "LFNG"      "LILRA6"    "LILRB2"   
##  [97] "LRIG1"     "LRRC4"     "LY6G5C"    "LY6G6F"    "MFAP3L"    "MFGE8"    
## [103] "MICB"      "MMP1"      "MPZL3"     "MSN"       "NAALADL1"  "NETO2"    
## [109] "NFASC"     "NTNG2"     "OCLN"      "PCDHA4"    "PDGFA"     "PDGFC"    
## [115] "PECAM1"    "PGF"       "PNOC"      "POMZP3"    "PPBP"      "PTPRC"    
## [121] "PVR"       "S100B"     "SARAF"     "SECTM1"    "SELL"      "SELP"     
## [127] "SEMA4F"    "SEMA5B"    "SEMA7A"    "SERPINF1"  "SFTPD"     "SIGLEC14" 
## [133] "SIGLEC5"   "SIGLEC6"   "SIGLEC7"   "SIGLEC9"   "SLAMF7"    "SPON1"    
## [139] "SPP1"      "ST14"      "STX1A"     "SYT1"      "TCTN1"     "TGFBI"    
## [145] "TGM2"      "TGOLN2"    "TNFRSF14"  "TRAF2"     "TSPAN3"    "TSPAN33"  
## [151] "TYROBP"    "ULBP3"     "UTS2"      "VEGFB"     "VEGFC"     "VSTM1"    
## [157] "VSTM4"     "WNT10A"    "WNT2B"     "ZG16B"
ligand_activities <- predict_ligand_activities(geneset = geneset_oi,
                                               background_expressed_genes = background_expressed_genes,
                                               ligand_target_matrix = ligand_target_matrix,
                                               potential_ligands = potential_ligands_liana)

ligand_activities <- ligand_activities %>%
  arrange(-aupr) %>%
  mutate(rank = rank(desc(aupr)))

ligand_activities
## # A tibble: 181 × 6
##    test_ligand auroc   aupr aupr_corrected pearson  rank
##    <chr>       <dbl>  <dbl>          <dbl>   <dbl> <dbl>
##  1 TGFB1       0.547 0.0472        0.0110   0.0299     1
##  2 IL10        0.541 0.0468        0.0105   0.0377     2
##  3 OSM         0.532 0.0466        0.0104   0.0348     3
##  4 B2M         0.519 0.0436        0.00733  0.0358     4
##  5 BMP6        0.529 0.0427        0.00645  0.0286     5
##  6 HBEGF       0.533 0.0424        0.00615  0.0238     6
##  7 NRG1        0.523 0.0419        0.00565  0.0289     7
##  8 CFH         0.531 0.0419        0.00563  0.0278     8
##  9 HLA-DOB     0.531 0.0414        0.00512  0.0230     9
## 10 TNF         0.522 0.0413        0.00510  0.0203    10
## # … with 171 more rows

NicheNet just ranks ligands during the ligand activity procedure. To check by which sender cell types these ligands are expressed, you can run:

best_upstream_ligands_liana <- ligand_activities %>% top_n(30, aupr) %>% arrange(-aupr) %>% pull(test_ligand) %>% unique()

best_upstream_ligands_liana %>% setdiff(best_upstream_ligands) # unique for liana's ligands
##  [1] "RGMA"     "IL15"     "FN1"      "FAM3C"    "BMP7"     "ANXA1"   
##  [7] "HLA-DQA1" "HLA-C"    "CALM1"    "COPA"     "VWF"      "THBS1"   
## [13] "SPINT1"   "PSAP"     "F13A1"    "HLA-B"
best_upstream_ligands %>% setdiff(best_upstream_ligands_liana) # unique for NicheNet's ligands
##  [1] "CD82"     "LEFTY1"   "HGF"      "NAALADL1" "GDF11"    "GLG1"    
##  [7] "CSF1"     "COL9A2"   "FCRL6"    "CD2"      "OCLN"     "ENG"     
## [13] "CNTN2"    "SPP1"     "CNTN4"    "JAG1"
Seurat::DotPlot(sobj, features = best_upstream_ligands_liana, cols = "RdYlBu") + Seurat::RotatedAxis()

active_ligand_target_links_df <- best_upstream_ligands_liana %>%
  lapply(get_weighted_ligand_target_links,geneset = geneset_oi,
         ligand_target_matrix = ligand_target_matrix, n = 500) %>%
  bind_rows() %>%
  drop_na()

active_ligand_target_links <- prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.33)

order_ligands <- intersect(best_upstream_ligands_liana, colnames(active_ligand_target_links)) %>% rev() %>% make.names()
order_targets <- active_ligand_target_links_df$target %>% unique() %>% intersect(rownames(active_ligand_target_links)) %>% make.names()
rownames(active_ligand_target_links) <- rownames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23
colnames(active_ligand_target_links) <- colnames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23

vis_ligand_target <- active_ligand_target_links[order_targets,order_ligands] %>% t()
p_ligand_target_network <- vis_ligand_target %>% make_heatmap_ggplot("Prioritized ligands","Predicted target genes", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential")  + theme(axis.text.x = element_text(face = "italic")) + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.0045,0.0090))
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p_ligand_target_network

In a next step, we will add the ligand activities to the Liana output table.

liana_nichenet <- liana_res %>% filter(target == receiver) %>% left_join(ligand_activities %>% rename(ligand.complex = test_ligand, rank_Nichenet = rank))
## Joining, by = "ligand.complex"

We will now visualize the top interactions prioritized by both methods.

First, we will show LIANA’s top 50 LR interactions towards the receiver cell type assessed in NicheNet, that are also supported with NicheNet prioritization (e.g. in top 30 of NicheNet’s ligand activities)

liana_nichenet %>% filter(target == receiver & rank_Nichenet <= 30) %>%
  liana_dotplot(target_groups = receiver,
                ntop = 50) +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5,
                                   hjust= 0.5))

Now we will show the opposite type of plot: show the NicheNet top 30 ligands, within their top 2 scoring pairs according to LIANA

liana_nichenet %>% filter(target == receiver & rank_Nichenet <= 30) %>% group_by(ligand.complex) %>% top_n(2, desc(aggregate_rank)) %>%
  liana_dotplot() +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5,
                                   hjust= 0.5))

Note: the prioritization by Liana is made based on specificity, by NicheNet based on ligand activity. None of these two approaches ranks ligands based on their differential expression between COVID and normal, which could also be an interesting aspect to use for prioritization. Therefore, we will also add the DE information (covid vs normal) for the sender cell and receiver cell types because we expect that some of these active ligands might be upregulated (or their receptor(s)). We can thus use this information for even further filtering.

liana_nichenet <- liana_nichenet %>%
  left_join(DE_table %>% rename(source = contrast, ligand.complex = name, logFC_ligand = logFCs, pval_ligand = pvals)) %>%
  left_join(DE_table %>% rename(target = contrast, receptor.complex = name, logFC_receptor = logFCs, pval_receptor = pvals))
## Joining, by = c("source", "ligand.complex")
## Joining, by = c("target", "receptor.complex", "adj_pvals")
liana_nichenet %>% filter(logFC_ligand >= 0.5 |  logFC_receptor >= 0.5) %>% filter(rank_Nichenet <= 30)
## # A tibble: 72 × 26
##    source target ligan…¹ recep…² aggre…³ mean_…⁴ natmi…⁵ natmi…⁶ conne…⁷ conne…⁸
##    <chr>  <chr>  <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Neutr… Monoc… HLA-B   LILRB1  0.00124    537.  0.0962    1360   1.10      475
##  2 Neutr… Monoc… HLA-B   LILRB2  0.00162    754.  0.0934    1434   0.871     826
##  3 Monoc… Monoc… HLA-B   LILRB1  0.00259    804   0.0878    1575   0.980     640
##  4 Neutr… Monoc… B2M     LILRB1  0.00292    643.  0.0866    1613   0.981     638
##  5 Neutr… Monoc… B2M     LILRB2  0.00380    909.  0.0840    1700   0.748    1077
##  6 NK ce… Monoc… HLA-B   LILRB1  0.0208     562.  0.102     1261   1.18      365
##  7 NK ce… Monoc… B2M     LILRB1  0.0268     568   0.0966    1348   1.17      378
##  8 T cel… Monoc… HLA-B   LILRB1  0.0279    1157.  0.0760    1909   0.807     954
##  9 Monoc… Monoc… HLA-B   LILRB2  0.0287    1189.  0.0852    1656   0.748    1079
## 10 B cel… Monoc… HLA-B   LILRB1  0.0393     962.  0.0805    1775   0.873     820
## # … with 62 more rows, 16 more variables: logfc.logfc_comb <dbl>,
## #   logfc.rank <dbl>, sca.LRscore <dbl>, sca.rank <dbl>,
## #   cellphonedb.pvalue <dbl>, cellphonedb.rank <dbl>, auroc <dbl>, aupr <dbl>,
## #   aupr_corrected <dbl>, pearson <dbl>, rank_Nichenet <dbl>,
## #   logFC_ligand <dbl>, pval_ligand <dbl>, adj_pvals <dbl>,
## #   logFC_receptor <dbl>, pval_receptor <dbl>, and abbreviated variable names
## #   ¹​ligand.complex, ²​receptor.complex, ³​aggregate_rank, ⁴​mean_rank, …

Above, we ran over all the NicheNet analysis steps one-by-one. In practice, we also recommend users to first build a step-by-step pipeline of NicheNet. But for future easier use, you can make a wrapper around your pipeline. We did this here as well for this case study and will now quickly run the NicheNet for the other cell types.

NicheNet analysis with wrapper

source("src/nichenet_wrapper.R")
nichenet_output_Tcells <- nichenet_covid19_wrapper(receiver = "T cells",
                                                   DE_table = DE_table,
                                                   sobj = sobj,  liana_res = liana_res,
                                                   ligand_target_matrix = ligand_target_matrix)
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Joining, by = "ligand.complex"
## Joining, by = c("source", "ligand.complex")
## Joining, by = c("target", "receptor.complex", "adj_pvals")
nichenet_output_Tcells$p_ligand_expression

nichenet_output_Tcells$p_ligand_target_network

nichenet_output_Tcells$p_liana_nichenet
## Warning: Removed 4 rows containing missing values (geom_point).

nichenet_output_Tcells$p_nichenet_liana
## Warning: Removed 4 rows containing missing values (geom_point).

nichenet_output_Tcells$liana_nichenet %>% filter(logFC_ligand >= 0.5 |  logFC_receptor >= 0.5) %>% filter(rank_Nichenet <= 30)
## # A tibble: 66 × 26
##    source target ligan…¹ recep…² aggre…³ mean_…⁴ natmi…⁵ natmi…⁶ conne…⁷ conne…⁸
##    <chr>  <chr>  <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Monoc… T cel… LGALS1  CD69    8.10e-4    444   0.102     1248   1.12      452
##  2 Neutr… T cel… HLA-B   CD3D    1.58e-3    434.  0.152      634   0.710    1171
##  3 Neutr… T cel… HLA-B   CD3G    1.58e-3    487.  0.141      728   0.683    1232
##  4 Monoc… T cel… HLA-B   CD3D    2.14e-3    620.  0.138      765   0.587    1516
##  5 Monoc… T cel… HLA-B   CD3G    2.88e-3    721   0.129      888   0.560    1608
##  6 Monoc… T cel… LGALS1  ITGB1   4.29e-3   1013.  0.0468    3135   0.904     770
##  7 NK ce… T cel… HLA-B   CD3D    8.11e-3    419.  0.160      573   0.787     988
##  8 NK ce… T cel… HLA-B   CD3G    1.01e-2    487.  0.149      652   0.760    1046
##  9 Neutr… T cel… HLA-B   CD8B    1.43e-2    961.  0.142      723   0.429    2216
## 10 Neutr… T cel… HLA-B   CD8A    1.65e-2   1143.  0.0887    1554   0.417    2281
## # … with 56 more rows, 16 more variables: logfc.logfc_comb <dbl>,
## #   logfc.rank <dbl>, sca.LRscore <dbl>, sca.rank <dbl>,
## #   cellphonedb.pvalue <dbl>, cellphonedb.rank <dbl>, auroc <dbl>, aupr <dbl>,
## #   aupr_corrected <dbl>, pearson <dbl>, rank_Nichenet <dbl>,
## #   logFC_ligand <dbl>, pval_ligand <dbl>, adj_pvals <dbl>,
## #   logFC_receptor <dbl>, pval_receptor <dbl>, and abbreviated variable names
## #   ¹​ligand.complex, ²​receptor.complex, ³​aggregate_rank, ⁴​mean_rank, …

Limitations and Assumptions

One should bear in mind that CCC inference from single-cell transcriptomics data comes with a number of assumptions and limitations, particularly since there are multiple steps between expression and the actual occurrence of CCC event, since:

Gene expression is a only proxy of protein expression, activity, and secretion

At the same time we also make the assumptions of: - Well mixed cell populations, hence physical distance between cell clusters and transport mechanisms (e.g. ligand diffusion) are neglected - CCC also is often solely explained by gene expression, and hence via proteins, neglecting metabolic and mRNA exchange - CCC inferred at the cluster level is meaningful

## Putting it all together

Q: What similarities and difference in biological interpretation do you observe between de diferent analyses you performed (footprint/enrichment, ligand-receptor prediction, NicheNet analysis)? Can you explain some of the differences?

References

Browaeys, R., Saelens, W. and Saeys, Y., 2020. NicheNet: modeling intercellular communication by linking ligands to target genes. Nature methods, 17(2), pp.159-162.

Butler, A., Hoffman, P., Smibert, P., Papalexi, E. and Satija, R., 2018. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature biotechnology, 36(5), pp.411-420.

Cabello-Aguilar, S., Alame, M., Kon-Sun-Tack, F., Fau, C., Lacroix, M. and Colinge, J., 2020. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Research, 48(10), pp.e55-e55.

Dimitrov, D., Türei, D., Garrido-Rodriguez, M., Burmedi, P.L., Nagai, J.S., Boys, C., Ramirez Flores, R.O., Kim, H., Szalai, B., Costa, I.G. and Valdeolivas, A., 2022. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nature Communications, 13(1), pp.1-13.

Hou, R., Denisenko, E., Ong, H.T., Ramilowski, J.A. and Forrest, A.R., 2020. Predicting cell-to-cell communication networks using NATMI. Nature communications, 11(1), pp.1-11.

Kolde, R., Laur, S., Adler, P. and Vilo, J., 2012. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics, 28(4), pp.573-580.